personalized alignment
A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications
Guan, Jian, Wu, Junfei, Li, Jia-Nan, Cheng, Chuanqi, Wu, Wei
Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their transition to real-world applications reveals a critical limitation: the inability to adapt to individual preferences while maintaining alignment with universal human values. Current alignment techniques adopt a one-size-fits-all approach that fails to accommodate users' diverse backgrounds and needs. This paper presents the first comprehensive survey of personalized alignment, a paradigm that enables LLMs to adapt their behavior within ethical boundaries based on individual preferences. We propose a unified framework comprising preference memory management, personalized generation, and feedback-based alignment, systematically analyzing implementation approaches and evaluating their effectiveness across various scenarios. By examining current techniques, potential risks, and future challenges, this survey provides a structured foundation for developing more adaptable and ethically aligned LLMs.
PAD: Personalized Alignment of LLMs at Decoding-Time
Chen, Ruizhe, Zhang, Xiaotian, Luo, Meng, Chai, Wenhao, Liu, Zuozhu
Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In response, this paper presents Personalized Alignment at Decoding-time (PAD), a novel framework designed to align LLM outputs with diverse personalized preferences during the inference phase, eliminating the need for additional training. By introducing a unique personalized reward modeling strategy, this framework decouples the text generation process from personalized preferences, facilitating the generation of generalizable token-level personalized rewards. The PAD algorithm leverages these rewards to guide the decoding process, dynamically tailoring the base model's predictions to personalized preferences. Extensive experimental results demonstrate that PAD not only outperforms existing training-based alignment methods in terms of aligning with diverse preferences but also shows significant generalizability to preferences unseen during training and scalability across different base models. This work advances the capability of LLMs to meet user needs in real-time applications, presenting a substantial step forward in personalized LLM alignment.
Recent advancements have demonstrated success in aligning language models with human preferences and values (Stiennon et al., 2020; Bai et al., 2022; Ouyang et al., 2022; Achiam et al., 2023). However, in this pluralistic world, users' preferences can diverge significantly based on their different cultures, educational backgrounds, religions, and political stances (Gordon et al., 2022; Sorensen et al., 2024b; Jang et al., 2023; Cheng et al., 2023). Furthermore, even for the same person, the preference for a particular LLM response can vary when the application scenario changes.
Hence, there always exists a proportion of human preferences that cannot be unified under a single general preference. These are known as personalized preferences, and current alignment frameworks struggle to align with them because policy optimization requires high-quality datasets and substantial computational cost. How can we align with personalized preferences without the need for additional data collection and policy training? In this paper, we introduce Personalized Alignment at Decoding-time (PAD), which aims to align LLM outputs with diverse personalized preferences during the inference phase without requiring additional training. To achieve this, we first propose a personalized reward modeling strategy, which decouples the text generation process (modeled as a Markov Decision Process) from personalized preferences, thereby enabling the acquisition of generalizable token-level personalized rewards.
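The idea of steering decoding with token-level personalized rewards can be illustrated with a small toy. The sketch below is not the PAD algorithm itself: the excerpt does not give PAD's scoring rule, so the additive combination of the base log-probability and a weighted reward, the weight `beta`, and every name here (`guided_greedy_decode`, `toy_base`, `toy_reward`) are assumptions made purely for illustration. It only shows that a frozen base model can be redirected at inference time, with no retraining, by a reward that depends on a stated preference.

```python
import math
from typing import Callable, Dict, List


def guided_greedy_decode(
    base_logprobs: Callable[[List[str]], Dict[str, float]],
    token_reward: Callable[[List[str], str, str], float],
    preference: str,
    beta: float = 1.0,
    max_tokens: int = 10,
    eos: str = "<eos>",
) -> List[str]:
    """Greedy decoding where each candidate token is scored as
    base log-probability + beta * personalized reward (assumed rule)."""
    prefix: List[str] = []
    for _ in range(max_tokens):
        logprobs = base_logprobs(prefix)  # frozen base model, never updated
        best = max(
            logprobs,
            key=lambda tok: logprobs[tok]
            + beta * token_reward(prefix, tok, preference),
        )
        if best == eos:
            break
        prefix.append(best)
    return prefix


def toy_base(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical base model: slightly prefers 'hi' as the first token."""
    if not prefix:
        return {"hi": math.log(0.6), "greetings": math.log(0.4)}
    return {"<eos>": 0.0}


def toy_reward(prefix: List[str], token: str, preference: str) -> float:
    """Hypothetical token-level reward conditioned on a preference label."""
    preferred = {"formal": "greetings", "casual": "hi"}
    return 1.0 if preferred.get(preference) == token else 0.0


formal = guided_greedy_decode(toy_base, toy_reward, "formal", beta=1.0)
unguided = guided_greedy_decode(toy_base, toy_reward, "formal", beta=0.0)
print(formal, unguided)
```

With `beta=0.0` the reward is ignored and the base model's own choice wins; with `beta=1.0` the "formal" preference outweighs the small gap in base log-probability. In a real system the reward would come from a learned personalized reward model rather than a lookup table.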